An improved Gibbs sampling method for motif discovery via sequence weighting.
نویسندگان
چکیده
The discovery of motifs in DNA sequences remains a fundamental and challenging problem in computational molecular biology and regulatory genomics, although a large number of computational methods have been proposed in the past decade. Among these methods, the Gibbs sampling strategy has shown great promise and is routinely used for finding regulatory motif elements in the promoter regions of co-expressed genes. In this paper, we present an enhancement to the Gibbs sampling method when the expression data of the concerned genes is given. A sequence weighting scheme is proposed by explicitly taking gene expression variation into account in Gibbs sampling. That is, every putative motif element is assigned a weight proportional to the fold change in the expression level of its downstream gene under a single experimental condition, and a position specific scoring matrix (PSSM) is estimated from these weighted putative motif elements. Such an estimated PSSM might represent a more accurate motif model since motif elements with dramatic fold changes in gene expression are more likely to represent true motifs. This weighted Gibbs sampling method has been implemented and successfully tested on both simulated and biological sequence data. Our experimental results demonstrate that the use of sequence weighting has a profound impact on the performance of a Gibbs motif sampling algorithm.
منابع مشابه
Info-gibbs: a Motif Discovery Algorithm That Directly Optimizes Information Content during Sampling
MOTIVATION Discovering cis-regulatory elements in genome sequence remains a challenging issue. Several methods rely on the optimization of some target scoring function. The information content (IC) or relative entropy of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics r...
متن کاملA Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery
As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related...
متن کاملConvergence Rate of Markov Chain Methods for Genomic Motif Discovery
We analyze the convergence rate of a popular Gibbs sampling method used for statistical discovery of gene regulatory binding motifs in DNA sequences. This sampler satisfies a very strong form of ergodicity (uniform). However, we show that, due to multimodality of the posterior distribution, the rate of convergence often decreases exponentially as a function of the length of the DNA sequence. Sp...
متن کاملA Comparison Of Expectation Maximization and Gibbs Sampling Strategies for Motif Finding
A set of protein or nucleotide sequences may be found to share patterns reflecting biological structure, function and change. The task of identifying these patterns, known as motif finding, can be viewed as an instance of multiple sequence alignment. While it is possible to identify motifs using x-ray and magnetic resonance structures, biologists and computer scientists have developed several a...
متن کاملMoitf GibbsGA: Sampling Transcription Factor Binding Sites Coupled with PSFM Optimization by Genetic Algorithm
Identification of transcription factor binding sites (TFBSs) or motifs plays an important role in deciphering the mechanisms of gene regulation. Although many experimental and computational methods have been developed, finding TFBSs remains a challenging problem. We propose and develop a novel sampling based motif finding method coupled with PSFM optimization by genetic algorithm, which we call...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Computational systems bioinformatics. Computational Systems Bioinformatics Conference
دوره شماره
صفحات -
تاریخ انتشار 2006